feat(B-0855): self-registration fires LAST + idempotent across reboots + de-duped against in-flight PRs — Otto-pushes-across-finish-line (Aaron 2026-05-27 architectural fix to B-0812) #5412
Merged
AceHack merged 1 commit on May 27, 2026
…aron 2026-05-27 architectural fix to B-0812)

Empirical anchor: PR #5408 auto-opened mid-install for node-0fe6eb; install then failed downstream at nixos-install --fallback bug (PR #5410 fix-fwd); registration PR orphaned for a node-id that never came up.

Operator framing: "how did it register before it even rebooted? it should not register until the last step when everything comes up and if it reboots it should not register over and over... cluster should realize it's register or has a pr in flight for register and not duplicate."

4 architectural changes:

1. Move self-registration OUT of zeta-install.sh Step 6.9 INTO systemd oneshot service that fires on first boot of installed OS, AFTER network-online + creds-restore + cluster reachable
2. Idempotency: marker file + upstream check + in-flight PR check before composing new PR
3. Cluster-agent coordination via Path B (Otto-pushes-PR-across-finish-line; per Aaron's simpler-form preference); Path A (/tmp folder standard) deferred to future row
4. De-dup: idempotent branch naming + in-flight detection + comment-on-existing

7 sub-rows B-0855.1-7 enumerated. Refines B-0812 (does not replace); keeps B-0813 ArgoCD reconciliation unchanged.

Composes with B-0812 + B-0813 + B-0835 (Bug 10) + B-0850 (systemd substrate) + B-0851 (persona-first scheduler) + B-0852 (cred persistence as pre-condition).

PR #5408 closed substrate-honestly with cross-link to this row.
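The de-dup change (idempotent branch naming + in-flight detection + comment-on-existing) can be illustrated with a minimal sketch. Everything named here is an assumption for illustration: the `self-register/<node-id>` branch scheme, the `OpenPr` shape, and the helper names are hypothetical and are not taken from the B-0855 row or from the real installer.

```typescript
// Hypothetical view of an open pull request; only the fields this sketch needs.
interface OpenPr {
  number: number;
  headBranch: string;
}

// Assumed naming scheme: exactly one branch name per node id, so a retry
// after a reboot maps to the same branch instead of minting a new one.
function registrationBranch(nodeId: string): string {
  return `self-register/${nodeId}`;
}

// Look for a registration PR that is already in flight for this node.
function findInFlightPr(nodeId: string, openPrs: OpenPr[]): OpenPr | undefined {
  const branch = registrationBranch(nodeId);
  return openPrs.find((pr) => pr.headBranch === branch);
}

// Decide between commenting on the existing PR and opening a new one.
function dedupAction(nodeId: string, openPrs: OpenPr[]): string {
  const existing = findInFlightPr(nodeId, openPrs);
  return existing ? `comment-on-existing #${existing.number}` : "open-new-pr";
}

// Illustrative data only; the node id and PR number are made up.
const openPrs: OpenPr[] = [{ number: 101, headBranch: "self-register/node-example" }];
console.log(dedupAction("node-example", openPrs));
console.log(dedupAction("node-other", openPrs));
```

Because the branch name is a pure function of the node id, a second registration attempt finds the first attempt's PR by name alone, with no extra state to track.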
Pull request overview
Adds a new P1 backlog row (B-0855) documenting an architectural fix for node self-registration so it fires on first boot (post-install), becomes idempotent across reboots, and avoids duplicate/in-flight registration PRs.
Changes:
- Added new per-row backlog doc for B-0855 describing the “registration fires last + idempotent + de-duped” architecture.
- Updated the backlog index to include B-0855.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| docs/backlog/P1/B-0855-self-registration-fires-LAST-post-install-post-first-boot-idempotent-across-reboots-deduped-against-in-flight-registration-prs-aaron-2026-05-27.md | New backlog row capturing the B-0855 architectural fix scope, rationale, and acceptance criteria. |
| docs/BACKLOG.md | Adds the B-0855 entry to the generated backlog index. |
AceHack added a commit that referenced this pull request on May 27, 2026
…0 substrate for Ace migration trajectory (14 sub-steps; 12 declarative-input categories; substrate-anchor for B-0852/0853/0855/0856 cross-refs) (#5420)

* docs(B-0854.1): zeta-install.sh step-state-machine inventory — Phase 0 substrate for Ace migration trajectory

B-0854 sub-row .1 (Phase 0; smallest pure-analysis slice). Documents the EXISTING imperative bash state-machine in zeta-install.sh so the B-0854 Phase 2 declarative-Ace-manifest schema can express the same surface.

Inventory covers:

- Top-level entry (REPO_URL, HOST, ZETA_AUTO_CONFIRM env semantics)
- Step-by-step state machine for all 14 sub-steps (1, 2, 3, 4, 5, 6, 6.5, 6.55, 6.6, 6.7, 6.8, 6.9, 6.95, 7) with inputs/outputs/side-effects/failure-modes/declarative-equivalent per step
- Cross-cutting: operator-prompt accumulation count (7 prompts today; B-0852 phase-split target = 1 passphrase prompt)
- Idempotency surface table — informs B-0855 architectural fix scope
- 12 distinct declarative-input categories the Ace manifest must capture (Phase 2 sub-row scope)
- Files-generated-during-install table mapping to B-0852.5 cred-manifest entries (6 mapped, 3 candidate-expansion items named)

Snapshot date: 2026-05-27 (origin/main 70596a8; PR #5417 cosign merge). Future refreshes should re-snapshot when zeta-install.sh changes substantially.

Composes with already-landed substrate-engineering arc:

- B-0852 + sub-rows (cred persistence) — PR #5403/#5411/#5414
- B-0853.1 (cosign signing) — PR #5417 + fix-fwd #5419
- B-0855 (self-register architectural fix) — PR #5412
- B-0856 Path A (deferred /tmp coordination) — PR #5413
- B-0854 parent (Ace migration trajectory) — PR #5405

No code change; pure documentation. Doesn't affect ISO substrate; batches into substrate-engineering history independent of next ISO build cycle.

* fix(B-0854.1): escape | inside code spans for MD056 table-column-count compliance

* fix(B-0854.1): 10 Copilot accuracy corrections — verified against actual zeta-install.sh content

PR #5420 Copilot review caught 10 substantive accuracy issues in the B-0854.1 inventory doc. All 10 verified against origin/main 70596a8's actual zeta-install.sh content + corrected.

Corrections:

- Name attribution → role-ref ("the human maintainer")
- Step 1 inputs: actual `lsblk -d -p -n -o NAME,TYPE,RM,RO,TRAN` + awk filter (not made-up NAME,SIZE,MODEL,TRAN,ROTA)
- Step 3 side effects: `sgdisk --zap-all` only (not `wipefs -af` too)
- Step 4: actual `sgdisk` (NOT `parted`); GPT layout via -n + -t flags; whole-disk longhorn partitions on DATA_DISKS too
- Step 6: `nixos-generate-config --root /mnt --force` (NOT --no-filesystems; --force overwrites existing config)
- Step 6.5: no MAGIC_NUMBER (didn't exist in script); INJECT_OK gate flag; iter-4 v1 manual-config-edit fallback path
- Step 6.9: SELF_REG_OK flag; documented graceful-skip path lines 731+
- nixos-install: actual line ~1004 (NOT 1096-1340); section renamed to "nixos-install (the actual build; ~line 1004)" since the prior range was wrong
- Step 7: actual lines 1261-1336 (NOT 1341-1352); banner driven by GH_AUTH_OK/GH_KEY_COUNT/INJECT_OK/SELF_REG_OK (NOT MAGIC_NUMBER); conditional sections listed in declarative equivalent

Resolves 10 Copilot threads on PR #5420.

Root cause of the inaccuracies: original draft was written from `grep -E "^# ── Step"` summaries + recollection of script behavior, not careful per-step body reads.

Discipline lesson: when authoring substrate-anchor docs claiming to inventory existing code, the read must be careful per-line, not skim-grep summary.

Composes with .claude/rules/verify-existing-substrate-before-authoring.md at the inventory-substrate scope (verify-content-of-thing-being-inventoried before authoring claims about its content).

---------

Co-authored-by: Lior <lior@zeta.dev>
Summary
Architectural fix to B-0812 (iter-5.4.1 self-registration) per Aaron 2026-05-27 empirical anchor: PR #5408 auto-opened mid-install for node-0fe6eb; install then failed at nixos-install --fallback bug (PR #5410 fix-fwd); registration PR orphaned for a node-id that never came up.

Operator: "how did it register before it even rebooted? it should not register until the last step when everything comes up and if it reboots it should not register over and over... cluster should realize it's register or has a pr in flight for register and not duplicate."
4 architectural changes
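1. Move self-registration OUT of zeta-install.sh Step 6.9; INTO systemd oneshot service that fires on FIRST BOOT of installed OS, AFTER network-online + creds-restore + cluster reachable
2. Idempotency: marker file (~/.config/zeta/self-registered.marker) + upstream check (maintainers/<op>/cluster-nodes/<node>/node.yaml) + in-flight PR check before composing new PR
3. Cluster-agent coordination via Path B (Otto-pushes-PR-across-finish-line; per Aaron's simpler-form preference); Path A (/tmp folder standard) deferred to future row
4. De-dup: idempotent branch naming + in-flight detection + comment-on-existing

Composes with
B-0812 + B-0813 + B-0835 (Bug 10) + B-0850 (systemd substrate) + B-0851 (persona-first scheduler) + B-0852 (cred persistence as pre-condition).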
7 sub-rows enumerated
B-0855.1 NixOS module → B-0855.2 TS module + marker → B-0855.4 marker schema → B-0855.5 remove Step 6.9 → B-0855.3 Otto integration → B-0855.6 empirical test → B-0855.7 substrate landing.
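As a rough illustration of how the marker file, upstream check, and in-flight PR check might combine into one idempotent decision on each boot, here is a minimal sketch. The check ordering, the `RegistrationState` shape, the action names, and the "write the marker only" branch are all assumptions made for illustration; the actual TS module and marker schema belong to sub-rows B-0855.2 and B-0855.4 and may differ.

```typescript
// Possible outcomes of a first-boot (or reboot) registration attempt.
type Action = "skip" | "write-marker-only" | "comment-on-existing" | "open-pr";

// Hypothetical snapshot of the three checks, gathered before deciding.
interface RegistrationState {
  markerExists: boolean;     // local marker file is present on this node
  upstreamHasNode: boolean;  // node.yaml for this node already exists upstream
  inFlightPr: number | null; // number of an open registration PR, if any
}

// Assumed ordering: cheapest and most local check first.
function decideRegistration(state: RegistrationState): Action {
  if (state.markerExists) return "skip";                  // already registered from here
  if (state.upstreamHasNode) return "write-marker-only";  // registered, marker was lost
  if (state.inFlightPr !== null) return "comment-on-existing";
  return "open-pr";                                       // genuinely first registration
}

// A reboot after successful registration does nothing.
console.log(decideRegistration({ markerExists: true, upstreamHasNode: true, inFlightPr: null }));
// A reboot while a registration PR is still open does not open a second PR.
console.log(decideRegistration({ markerExists: false, upstreamHasNode: false, inFlightPr: 42 }));
```

Keeping the decision a pure function of the three checks makes it straightforward to unit-test every reboot scenario without touching the filesystem or the network.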
PR #5408 closed
Substrate-honestly with cross-link to this row.
Test plan
🤖 Generated with Claude Code